How does removing the low samples (Day 8 and CFUs around 0) affect prediction?
Removing these data points improves predictability for days 6 and 8 but not for the earlier days. If I raise the cutoff to remove all samples with CFU below 4, there is still no improvement, and in some cases performance decreases.
Otu00004 Otu00005 Otu00015 Otu00019 Otu00030 Otu00199 Otu00200 Otu00250
      14       17       13       15       13       16       17       19
Boruta confirmed the following OTUs as important for predicting day 9/10 CFU:
Otu00004 Otu00005 Otu00015 Otu00030 Otu00081 Otu00199 Otu00200 Otu00250 Otu00297
      21       22       21       21       21       22       22       22       20
A second selection approach: collect the features from the most predictive community/CFU models (R^2 >= 0.6 and MSE <= 0.8), convert all % Increase in MSE values to relative values, take the median value for each OTU, and then keep the OTUs that fall above the median value. This results in the following OTUs:
"Otu00001" "Otu00002" "Otu00004" "Otu00005" "Otu00012" "Otu00014" "Otu00015" "Otu00016" "Otu00019" "Otu00028" "Otu00030" "Otu00048" "Otu00081" "Otu00101" "Otu00118" "Otu00199" "Otu00200" "Otu00250" "Otu00297"
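
The median-based selection described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code used for the analysis. The dictionary layout, the field names (`r2`, `mse`, `inc_mse`), the function name `select_otus`, and the toy OTU labels and values are all hypothetical. Two steps are assumptions, because the notes do not spell them out: "relative values" is taken to mean dividing each % Increase in MSE by the largest value in the same model, and "above the median value" is taken to mean strictly above the median of the per-OTU medians.

```python
from statistics import median


def select_otus(models, min_r2=0.6, max_mse=0.8):
    """Return OTUs whose median relative importance exceeds the overall median."""
    # Step 1: keep only the most predictive models (R^2 >= 0.6 and MSE <= 0.8).
    kept = [m for m in models if m["r2"] >= min_r2 and m["mse"] <= max_mse]

    # Step 2: convert % Increase in MSE to relative values.
    # Assumption: "relative" means divided by the model's largest value.
    per_otu = {}
    for m in kept:
        top = max(m["inc_mse"].values())
        if top <= 0:
            continue  # no informative feature in this model, skip it
        for otu, value in m["inc_mse"].items():
            per_otu.setdefault(otu, []).append(value / top)

    if not per_otu:
        return []

    # Step 3: median relative importance for each OTU across the kept models.
    otu_median = {otu: median(values) for otu, values in per_otu.items()}

    # Step 4: keep OTUs strictly above the median of the per-OTU medians.
    # Assumption: this is what "above the median value" refers to.
    cutoff = median(otu_median.values())
    return sorted(otu for otu, value in otu_median.items() if value > cutoff)


# Toy input with hypothetical OTU labels and made-up numbers (not real results).
models = [
    {"r2": 0.70, "mse": 0.50,
     "inc_mse": {"OtuA": 20.0, "OtuB": 10.0, "OtuC": 5.0, "OtuD": 1.0}},
    {"r2": 0.65, "mse": 0.70,
     "inc_mse": {"OtuA": 8.0, "OtuB": 16.0, "OtuC": 4.0, "OtuD": 2.0}},
    # This model fails both thresholds and is ignored.
    {"r2": 0.30, "mse": 1.20,
     "inc_mse": {"OtuA": 1.0, "OtuB": 1.0, "OtuC": 30.0, "OtuD": 30.0}},
]

selected = select_otus(models)
print(selected)
```

Scaling within each model before taking the median keeps a single model with unusually large raw importance values from dominating the ranking. If the original analysis scaled differently (for example, to each model's sum rather than its maximum), only step 2 would need to change.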